Joint modeling of forced vital capacity measures with time to onset of polycythemia among chronic obstructive pulmonary outpatients follows‐up: A case of University of Gondar Referral Hospital

Abstract Background and Aims Chronic obstructive pulmonary disease (COPD) causes airflow obstruction and respiratory problems. The main objective of this study was to determine the risk factors for the progression of COPD using longitudinally measured forced vital capacity together with the time to onset of polycythemia among outpatients under follow‐up. Methods A retrospective study design was used to gather data on the longitudinal change of forced vital capacity and the time to onset of polycythemia from medical charts. The joint model consists of a longitudinal submodel for the change of forced vital capacity and a survival submodel for the time to onset of polycythemia of chronic obstructive pulmonary patients. Results Among the 266 patients, the estimated mean forced vital capacity was 74.45 with a standard deviation of 8.59. The estimated association parameter was −0.006, which indicates that lower values of forced vital capacity were associated with a higher risk of polycythemia and vice versa. The joint model analysis found that smoking, comorbidities, marital status, weight, and HIV jointly affected the two responses, namely the change of forced vital capacity and the time to onset of polycythemia among chronic obstructive pulmonary patients. Conclusion Comparing the overall performance of the separate and joint models, the joint model of the longitudinal measures with the time‐to‐event outcome was the best model because of its smaller standard errors and the statistical significance of the association parameter.

Chronic obstructive pulmonary disease (COPD) is a preventable and treatable condition characterized by breathing difficulty that is not entirely curable. COPD is the third leading cause of death globally, causing 3.23 million deaths in 2019. 1 Globally, the most commonly encountered risk factor for COPD is tobacco smoking. Other types of tobacco and marijuana are also risk factors for COPD. Outdoor, occupational, and indoor air pollution, the latter resulting from the burning of biomass fuels, are other major COPD risk factors. 2 Noncommunicable diseases, including chronic respiratory disease, are the key cause of death and morbidity globally. The most recent estimates are available from the Global Burden of Disease Study 2017 reports: around 3.2 million deaths were due to COPD and 495,000 deaths were due to asthma. 3
The incidence of COPD is increasing because of urbanization, industrial pollution, tanneries, and biomass fuel burning inside homes, particularly in Asian and African countries. The prevalence of COPD in Africa is 13.4%, ranging from 9.4% to 22.1%. In Asia, its prevalence was reported at 13.5%, with a range of 3%−22.2%. Additionally, COPD is a common cause of hospital admission in many countries and contributes substantially to healthcare costs. 4
Some studies have reported that the occurrence of COPD in sub-Saharan Africa has decreased. Nevertheless, COPD has become a growing health problem in sub-Saharan Africa because of tobacco smoking and exposure to biomass fuels. In most of sub-Saharan Africa, 90% of rural households depend on biomass fuel for cooking and heating, which affects young children (severe lower respiratory infections) and women (COPD). It is a cause of substantial death and illness in the region. 5
The relationship between HIV infection and COPD in sub-Saharan Africa has also been studied. One study evaluated the occurrence and risk factors of COPD according to HIV status in a reference center for HIV and tuberculosis control in Cameroon. 6 Around 63.3% of the participants with COPD presented with cough as the primary respiratory symptom. Nearly 55.5% of male patients with COPD and 45.6% of female patients with COPD had a cough.
Respiratory symptoms of cough, phlegm, wheezing, and shortness of breath were significantly more common in patients with COPD than in people without COPD. 7 The reports discussed above show that up-to-date evidence concerning COPD in Ethiopia is poorly documented at the country level. COPD is increasing over time and affects developing countries such as Ethiopia. According to experts, the majority of deaths in the population are caused by chronic obstructive lung disorders. This implies that further studies are needed to identify and evaluate the factors associated with the progression of forced vital capacity and the time to onset of polycythemia among chronic obstructive pulmonary patients under follow-up. The prevalence of COPD increases with age, and people are not normally diagnosed until they are 50 years of age or older. It is far more common in males than in females, and men are much more likely to die from it. Mortality rates are higher in Scotland and the North of Britain than in the South, reflecting the fact that the prevalence and its consequences are nearly twice as high within the most deprived 20% of the population. 8
This finding indicated that there is an increasing progression of COPD in our country. The researcher sought a way to reduce and manage the prevalence of the disease by using repeated measurements of forced vital capacity together with the onset of polycythemia during follow-up among COPD patients. A previous investigation was based on a cross-sectional study design, which does not necessarily show the prevalence of disease over time. It used logistic regression to determine the risk factors without allowing for the relationships among the outcomes or for subject-specific random effects. In this investigation, this problem was addressed by accounting for correlated and missing data in the longitudinal submodel.
Fitted separately, the linear mixed model for longitudinal forced vital capacity and the Cox proportional hazards (PH) model for time to onset of polycythemia do not account for the dependence between these two data types (longitudinal and time-to-event data).
Therefore, the researcher was interested in joint modeling of the longitudinal forced vital capacity measure with the time to onset of polycythemia in chronic obstructive pulmonary outpatients. Joint modeling reduces bias in parameter estimation and improves the adequacy of statistical inference. Accordingly, the main objective of this investigation was to determine the risk factors for the progression of COPD using longitudinally measured forced vital capacity together with the time to onset of polycythemia among outpatients under follow-up.

| Study area
The investigation was conducted at the University of Gondar Referral Hospital, one of the oldest institutions in Ethiopia. It has been training health science professionals for more than half a century. The university is situated at the center of Gondar city in the Amhara Region, northwest Ethiopia. The hospital provides inpatient and outpatient services to the population of Gondar town and the nearby woredas and zones. Gondar town is the capital of the Central Gondar zone and is located 727 km from Addis Ababa.

| Data source and study design
For this research, a secondary source of data was used. The study design was an observational retrospective study. Both longitudinal and survival data were extracted from patients' charts using a checklist that the researcher designed from the available variables, which include clinical and sociodemographic data on all COPD outpatients during follow-up.
The study period was from February 1, 2019 to February 1, 2022, and a total of 266 chronic obstructive pulmonary outpatients were obtained from the University of Gondar Referral Hospital. The data for responses and covariates were collected with the support of the healthcare service. In this study, chronic obstructive pulmonary outpatients are the patients who followed clinical treatment up to the discharge date, or who left the hospital by any means, were transferred to another hospital, or died before completing the treatment.

| Inclusion criteria
The study included all chronic obstructive pulmonary outpatients who had two or more visits and whose forced vital capacity was measured during follow-up within the study period.

| Exclusion criteria
Chronic obstructive pulmonary outpatients who had only a single record and patients whose forced vital capacity was not measured were excluded from the study.

| Independent variable
The explanatory variables for this investigation were sociodemographic and clinical characteristics expected to be related to the repeated measurements of forced vital capacity and the time to onset of polycythemia in chronic obstructive pulmonary outpatients during treatment, as shown in Table 1.

| Methods of data analysis
In this study, three types of statistical models were applied: a survival model, a longitudinal model, and a joint model.

| Survival data analysis
The response variable of interest in survival analysis is the amount of time until an event happens. The term "survival analysis" refers to methods for data in which the quantity examined is the time it takes for a particular event of interest to occur. Unlike other statistical techniques, it is most useful when the data are censored. Here it entails the modeling and analysis of time-to-event data, with the time until polycythemia occurs among COPD patients as the primary end point. Time is defined as the number of months between the start of a person's follow-up and the occurrence of the event. 13

| Joint model analysis for longitudinal and survival data
The joint model consists of two linked submodels: the measurement model for the longitudinal process (forced vital capacity) and the time-to-event model for the survival process (time to onset of polycythemia). The joint modeling approach is used to obtain less biased and more efficient inference. 14

| RESULT AND DISCUSSION

| Descriptive analysis
The baseline sociodemographic and clinical characteristics of the patients enrolled in the analysis are shown in Table 2.
The analysis included 266 COPD patients treated at the University of Gondar Referral Hospital from February 1, 2019 to February 1, 2022. Results from Table 6 showed that smoking status, presence of related diseases (comorbidities), HIV, education status, and weight were statistically significant positive predictors of the longitudinal change of forced vital capacity.
However, visit time was negatively associated with the longitudinal change of forced vital capacity among COPD outpatients. On the other hand, marital status, age, occupation, sex, and residence of COPD patients were not significant. Moreover, from the random effect estimates, the estimated subject-specific variability was statistically significant at the 95% confidence level. The statistical significance of this parameter supports the assumption of heterogeneous variances for the repeated measurement data. Table 7 showed that the survival experience of the patients (time to onset of polycythemia) differed significantly between the categories of comorbidities, sex, education, smoking, marital status, and HIV at the 5% level of significance. On the other hand, no statistically significant difference in survival experience was observed by residence or occupation.

| Cox PH assumption
The PH assumption asserts that the hazard ratios are constant over time.
That is, the relative risk of failure between groups must be the same no matter how long the subjects have been followed. To test this assumption, the global test and Schoenfeld residuals were used.
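As a brief reminder of what this assumption states, the Cox model and the resulting hazard ratio between two patients can be written as below. The notation ($h_0$ for the baseline hazard, $x_i$ for the covariate vector of patient $i$, $\beta$ for the coefficients) is a standard convention assumed here, not taken from the paper.

```latex
h_i(t) = h_0(t)\exp\left(x_i^{\top}\beta\right),
\qquad
\frac{h_i(t)}{h_j(t)} = \exp\left\{(x_i - x_j)^{\top}\beta\right\}
```

The ratio on the right does not depend on $t$, which is the property that the global test and the Schoenfeld residuals examine.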

| Joint modeling analysis for longitudinal and survival data
Table 11 presents the parameter estimates of the joint model. The average forced vital capacity of COPD patients with primary education was significantly higher, by 3.001%, than that of COPD patients with no education.

| Interpretation of the results
In the survival submodel, the estimated hazard ratio (HR) of polycythemia for patient weight was exp(−0.0154) = 0.98, implying that for a one-unit increase in weight, the hazard of polycythemia among COPD patients decreased significantly by 2%, keeping all other variables constant. The estimated HR of polycythemia for married patients was reported as exp(0.2560) = 0.11, implying that married patients were 89% less likely to develop polycythemia than single patients among COPD outpatients, keeping all other variables constant. The estimated HR of polycythemia for sex was exp(−0.3166) = 0.73, implying that male patients were 27% less likely to develop polycythemia than female patients among COPD outpatients, keeping all other variables constant.
The estimated association parameter was α = −0.006, indicating a negative association between forced vital capacity and the risk of onset of polycythemia among COPD outpatients. That is, lower values of forced vital capacity were associated with a higher risk of polycythemia and vice versa.
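The interpretation of these estimates rests on exponentiating each coefficient. The short sketch below shows that arithmetic for the coefficients quoted in this section; the function names and dictionary labels are illustrative assumptions, and only the numeric coefficients come from the text.

```python
import math

# Coefficients (log hazard ratios) as printed in the text.
# The labels are illustrative; only the numbers come from the paper.
coefficients = {
    "weight (per unit)": -0.0154,
    "married vs single": 0.2560,
    "male vs female": -0.3166,
    "association alpha (per unit FVC)": -0.006,
}


def hazard_ratio(coef: float) -> float:
    """Exponentiate a log-hazard coefficient to obtain the hazard ratio."""
    return math.exp(coef)


def percent_change_in_hazard(coef: float) -> float:
    """Percent change in hazard for a one-unit increase in the covariate."""
    return (math.exp(coef) - 1.0) * 100.0


for name, coef in coefficients.items():
    print(
        f"{name}: HR = {hazard_ratio(coef):.4f}, "
        f"change in hazard = {percent_change_in_hazard(coef):+.2f}%"
    )
```

With the coefficients as printed, weight and sex reproduce the reported HRs of 0.98 and 0.73, and α = −0.006 corresponds to roughly a 0.6% lower hazard per unit increase in forced vital capacity. The printed marital status coefficient (0.2560) exponentiates to about 1.29 rather than 0.11, so the coefficient and the HR cannot both be as printed.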

| DISCUSSION
In this study, the association parameter was statistically significant in the joint model. This indicated a correlation between the two responses and showed that the joint model was a better fit to the data than the separate models. This finding is consistent with a study by Long and Mills, in which a joint model trained on a single study performed very well in discriminating between diagnosed and prediagnosed participants in the remaining test studies; that study concluded that joint modeling is an improvement over traditional survival modeling because it considers all the longitudinal observations of covariates that are predictive of an event. 15 HIV is a very important clinical predictor of the longitudinal measure of forced vital capacity and the time to onset of polycythemia among COPD patients. The estimated HR of polycythemia for patients with HIV infection was exp(0.3494) = 1.4182, implying that patients with HIV infection had a 41.82% higher hazard of developing polycythemia than patients without HIV infection among COPD outpatients, keeping all other variables constant. This agrees with the study conducted by Pefura-Yone et al.
Among patients with HIV infection in this setting, many of whom have a history of pulmonary tuberculosis, the presence of chronic respiratory symptoms and the other determinants identified in this study should trigger specific investigation for a possible underlying COPD. Such a proactive approach would help optimize the care of those patients. 6 From the final selected model, sex, weight, smoking status, marital status, presence of related disease, HIV status, and occupation were found to have statistically significant effects in the survival submodel. To interpret the results, the HR was calculated for the survival submodel. The study by Zhang et al. revealed that the event was 3.60 times more likely in males than in females. In our study, patient sex was also statistically significant for time to polycythemia, but the HR indicates that males were 27% (1 − 0.73 = 0.27) less likely to experience the event than females. This result contradicts the study by Zhang et al. Additionally, it suggests that females in this study were more prone to the onset of polycythemia because of traditional cooking practices. Patient weight had a statistically significant effect on time to polycythemia: a one-unit increase in weight was associated with a 2% decrease in the expected hazard, holding the other covariates constant. The HR for smoking status was reported as 0.76. Comparing smokers with nonsmokers, the hazard of smokers was higher than that of nonsmoking COPD patients. This finding coincides with a previous study. 16 In the longitudinal submodel, predictors such as marital status, baseline age, education, comorbidities, HIV, and smoking were found to have statistically significant effects on forced vital capacity.
According to the results, married COPD patients were 89% less likely to develop polycythemia than single patients. A one-unit increase in patient weight was associated with a 0.0618% change in forced vital capacity. This result is also in line with the findings of Wang et al. Advanced age (>60 years old) was identified as the most important risk factor for COPD (OR = 3.3). For people aged 40−59, smoking was the most important risk factor for COPD (OR = 2.7). Among people aged 40−59, the highest prevalence of COPD (37.5%) was found in a subgroup defined by age (a cutoff of 54 years), a BMI of less than 23 kg/m², and a smoking history of more than 33 pack-years. 17 The average forced vital capacity of COPD patients who smoked cigarettes was higher, by 2.5119%, than that of patients with no smoking habit. This is supported by a previous study, in which patients with a smoking habit were 2.55 times more likely to be at risk than patients with no smoking habit. 16 Almost all COPD patients (98.8%) were smokers.
When the age of patients increased by 1 year, the forced vital capacity of COPD patients increased significantly by 0.0928%. 16 This finding is in line with a study by Lee et al., which found that each unit increase in age was associated with an estimated 65% increase in COPD. 18 The average forced vital capacity of COPD patients who had related diseases (comorbidities) was significantly higher, by 2.5186, than that of COPD patients who had no related diseases. The most frequently associated morbidities were arterial hypertension (59.5%), dyslipidemia (54.3%), and type 2 diabetes mellitus (31.2%); 32% of the patients had heart disease.
There is a high prevalence of active smoking, type 2 diabetes mellitus, and heart disease in patients referred for COPD to Canary Island pneumology outpatient services. This finding is in line with another study. 19

| CONCLUSION
In this study, the results of both separate and joint analyses were presented. Compared with the separate analyses, the joint model, which adjusts for the correlation between the two responses, yielded a significant reduction in the standard errors and therefore more efficient inferences. When the overall performance of the separate and joint models was compared in terms of model parsimony and goodness of fit, the joint model performed better based on its significant likelihood ratio test. This means that joint modeling can benefit the analysis of both the repeated measures of forced vital capacity and the time to onset of polycythemia among chronic obstructive pulmonary outpatients.
The longitudinal submodel under the joint modeling analysis showed that age, education, comorbidities, marital status, HIV, smoking, observation time, and weight were significantly associated with the change of forced vital capacity. The survival submodel under the joint modeling analysis showed that marital status, sex, smoking, comorbidities, weight, HIV, and occupation were statistically significant factors for the time to onset of polycythemia among COPD patients. Additionally, the association parameter (the effect of the true unobserved longitudinal value of forced vital capacity) was also statistically significant for the onset of polycythemia in COPD patients.
Results from the joint model analysis revealed that marital status, comorbidities, smoking, HIV, and weight were significantly associated with the two responses (a repeated measure of forced vital capacity and time to onset of polycythemia) of COPD patients.
The joint model performed best overall, with less variability, a statistically significant association parameter, and greater goodness of fit than the separate models. The joint model was therefore found to be preferable for the simultaneous analysis of longitudinal measurement and survival data.

| Study variables

| Outcome variables
Two outcome variables were considered:
The longitudinal outcome: progression of forced vital capacity, a continuous variable measured with a spirometer approximately every 3 months.
The survival outcome: time to onset of polycythemia for chronic obstructive pulmonary outpatients (0 = censored, 1 = event).

The survival model was used to examine the risk factors that affect the time to onset of polycythemia, and the longitudinal model was used separately to determine the risk factors that affect the longitudinal change of forced vital capacity. The joint model consists of a longitudinal submodel for the change of forced vital capacity and a survival submodel for the time to onset of polycythemia of chronic obstructive pulmonary patients. The data were checked, cleaned, coded, entered, and analyzed using SPSS version 20 and R version 4.3.0 software. Bivariate logistic regression was performed to identify potential candidate variables, and each variable with a p-value less than 0.05 was a candidate for a multiple logistic regression analysis to determine the factors significantly associated with the Hepatitis B virus. Finally, variables with p-values less than 0.05 in the multivariable logistic regression model were taken as statistically significant. 9

TABLE 1 Independent variable.
No.   Variable        Description                     Coding
      Education       Education of patients           0 = no education, 1 = primary education, 2 = secondary education, 3 = diploma and above
10    Comorbidities   Presence of related diseases    0 = no, 1 = yes
11    HIV             HIV status of patients          0 = no, 1 = yes

| Exploratory data analysis
Exploratory data analysis serves to extract as much information as possible from the raw data; plotting graphs to examine the data carefully should be done before any formal model fitting is carried out. Hence, this investigation explored the data using descriptive statistics and a profile plot of FVC over the study period, and assessed the nature of the data by examining individual profiles and the average progression. The individual profile graphs and the variance structure were used to improve understanding of the variability in the data and to determine which random effects to include in the linear mixed model. The mean structure was used to gain insight into the time function that can be used to model the data. 10 Furthermore, the Kaplan−Meier estimator was used to estimate and plot survival probabilities as a function of time, and the observed differences between the survival probabilities of predictor variables were tested using the log-rank test. 11

| Linear mixed model for longitudinal data
Multiple observations of the same subject across time give rise to longitudinal data. Longitudinal response data may arise when measurements are taken on the same subject at several times or on related subjects. In both cases, the response variables are likely to be correlated; that is, measuring forced vital capacity repeatedly over the duration of the study calls for a linear mixed effects model for the analysis of continuous longitudinal responses. Whereas the between-subject variation in forced vital capacity was modeled to understand differences among individuals, the within-subject variation was used to analyze change over time. 12 The LMM extends the classical linear regression model by taking into account both fixed effect and random effect terms. The random effects are subject specific, and the fixed effects are the set of predictors that are the same for all subjects. The fixed effects in the linear mixed model reflect the population-wide associations between the predictors and forced vital capacity. Because random effects are specific to individuals within a population, they are used directly to model the unexplained variation in forced vital capacity across the levels of the data.
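The linear mixed model and its link to the survival submodel can be sketched in commonly used joint-model notation. The symbols below are assumptions made for illustration, since the paper's own equations are not present in this text: $y_i(t)$ is the observed forced vital capacity of patient $i$ at time $t$, $m_i(t)$ its true underlying value, $b_{0i}$ and $b_{1i}$ a random intercept and slope, $w_i$ the baseline covariates of the survival submodel, and $\alpha$ the association parameter.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim N(0, \sigma^2),\\
m_i(t) &= x_i^{\top}(t)\,\beta + b_{0i} + b_{1i}\,t, \qquad (b_{0i}, b_{1i})^{\top} \sim N(0, D),\\
h_i(t) &= h_0(t)\exp\left\{ w_i^{\top}\gamma + \alpha\, m_i(t) \right\}.
\end{aligned}
```

Under this formulation, a negative $\alpha$ means that a higher true forced vital capacity at time $t$ corresponds to a lower hazard of polycythemia at that time, and setting $\alpha = 0$ recovers the two separate models.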

Figure 1 shows the Kaplan−Meier survival curve, which provides an initial insight into the shape of the survival function. The plot shows the overall survival probability of COPD patients, which is a nonincreasing step function of survival time.
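To illustrate why the Kaplan−Meier curve is a nonincreasing step function, the sketch below computes the product-limit estimate by hand. The function name and the toy follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  : follow-up time of each patient (for example, months)
    events : 1 if the event was observed at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data: seven patients, times in months.
toy_times = [3, 6, 6, 9, 12, 12, 15]
toy_events = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.3f}")
```

The estimate drops only at times where an event occurs; censored patients reduce the size of the risk set without producing a step.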

Figure 2 displays the time to onset of polycythemia of COPD patients by the categorical variables in the study.
In the longitudinal submodel, comorbidities, HIV, smoking, and educational status were positively associated with the longitudinal change of forced vital capacity, whereas marital status, weight, and visit time were negatively associated with it. The results also show that, in the survival submodel, smoking, comorbidities, and HIV were positively associated with the hazard of onset of polycythemia, whereas marital status, sex, occupation, and weight were negatively associated with it. Therefore, smoking, comorbidities, marital status, weight, and HIV jointly affected the two responses, namely the change of FVC and the time to onset of polycythemia among COPD patients. The estimated association parameter (α) was negative (−0.006), which is evidence of a relationship between forced vital capacity and the onset of polycythemia in patients with COPD. This means that a decreasing trend in forced vital capacity increases the risk of polycythemia among COPD patients.
The estimated HR of polycythemia for patients who were merchants was exp(−0.3198) = 0.73, implying that merchants were 27% less likely to develop polycythemia than farmers among COPD outpatients, keeping all other variables constant. In addition, the estimated HR of polycythemia for patients who worked in other occupations was reported as exp(−0.03643) = 0.69, implying that they were 31% less likely to develop polycythemia than farmers among COPD outpatients, keeping all other variables constant. The estimated HR of polycythemia for patients who had related diseases was exp(0.2103) = 1.2334, implying that patients with related diseases had a 23.34% higher hazard of developing polycythemia than patients without related diseases among COPD outpatients, keeping all other variables constant. In addition, the estimated HR of polycythemia for patients with HIV infection was exp(0.3494) = 1.4182, implying that HIV-infected patients had a 41.82% higher hazard of polycythemia than patients without HIV infection.
TABLE 11 Parameter estimates of joint modeling.

Table 4 compares the null model, fitted without covariates, with the full model, fitted with all covariates considered. The full model was a better fit for the data because of its smaller AIC and BIC. Based on the univariable analysis in Table 5, follow-up visit time, smoking habit, place of residence, marital status, HIV, education, weight, related disease (comorbidities), and sex were significant at the 25% level and were therefore candidates for the multivariable analysis.
TABLE 2 Summary statistics for independent variables.
TABLE 3 Selection of random effects.
TABLE 4 Linear mixed model comparisons.

From Table 8, the p-value of the global test was statistically insignificant, indicating that the PH assumption was not violated.
From Table 9, age, place of residence, comorbidities, HIV status, education, occupation, smoking status, sex, weight, and marital status were statistically significant at the 25% level of significance and were taken as candidate variables for the multivariable Cox PH model; no predictor was insignificant at this level in the univariable Cox PH model. Results from Table 10 revealed that sex, comorbidities (related diseases), HIV, smoking, occupation, weight, and marital status were statistically significant predictors positively associated with time to onset of polycythemia. However, age was negatively associated with time to onset of polycythemia. On the other hand, residence and educational status were not statistically significant predictors of the time to onset of polycythemia among COPD patients at the 5% level of significance.
TABLE 5 Univariable analysis for linear mixed effects model.
TABLE 6 Parameter estimates for the final linear mixed effects model. Random-effects entry (b0i, b1i): −0.54. Abbreviation: ref, reference category of categorical variables.
FIGURE 1 The overall estimate of Kaplan−Meier survival function plot of chronic obstructive pulmonary disease patients.

The average forced vital capacity of married patients was significantly lower, by 2.85%, than that of single patients among COPD patients, holding all other variables constant. The average forced vital capacity of smokers was significantly higher, by 2.5119%, than that of nonsmokers among COPD patients, keeping all other variables constant. In addition, the average forced vital capacity of HIV-infected patients was significantly higher, by 5.1488%, than that of patients free from HIV infection among COPD patients, keeping all other variables constant. The average forced vital capacity of patients who had related diseases (comorbidities) was significantly higher, by 2.5186, than that of patients who had no related diseases among COPD patients, keeping all other variables constant.
FIGURE 2 Time to onset of polycythemia by categorical variables of COPD patients. COPD, chronic obstructive pulmonary disease; KM, Kaplan−Meier.
TABLE 7 Log-rank test for each categorical variable in the study.
TABLE 8 Proportional hazards assumption.
TABLE 9 Univariable analysis for Cox proportional hazard model.
TABLE 10 Result of the final Cox PH model for the time to onset of polycythemia of COPD patients.
Abbreviations: COPD, chronic obstructive pulmonary disease; ref, reference category for categorical predictors; SE, standard error.